Worst-case Complexity of Cyclic Coordinate Descent: $O(n^2)$ Gap with Randomized Version

Authors

  • Ruoyu Sun
  • Yinyu Ye
Abstract

This paper concerns the worst-case complexity of the Gauss-Seidel method for solving a positive semidefinite linear system, or, equivalently, that of cyclic coordinate descent (C-CD) for minimizing a convex quadratic function. The known provable complexity of C-CD can be O(n) times slower than that of gradient descent (GD) and O(n^2) times slower than that of randomized coordinate descent (R-CD). However, these gaps are rather puzzling, since so far they have not been observed in practice; in fact, C-CD usually converges much faster than GD and is sometimes comparable to R-CD. Thus some researchers believe the gaps are due to weaknesses of the proofs rather than of the C-CD algorithm itself. In this paper we show that the gaps indeed exist. We prove that there exists an example for which C-CD takes at least Õ(n^3 κ) or Õ(n^4 κ_CD) operations, where κ is the condition number, κ_CD is a well-studied quantity that determines the convergence rate of R-CD, and Õ(·) hides the dependency on log(1/ε). This implies that C-CD can indeed be O(n) times slower than GD, and O(n^2) times slower than R-CD, in the worst case. Our result establishes one of the few examples in continuous optimization that demonstrate a large gap between a deterministic algorithm and its randomized counterpart. Based on the example, we establish several almost tight complexity bounds of C-CD for quadratic problems. One difficulty in analyzing the constructed example is that the spectral radius of a non-symmetric iteration matrix does not necessarily constitute a lower bound on the convergence rate.
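For scale, each GD step and each full coordinate cycle on an n × n quadratic costs O(n^2) operations, so the gaps above are gaps in total operation counts. Below is a minimal sketch, not the paper's adversarial construction, of the two update rules being compared: cyclic versus uniformly random coordinate selection with exact minimization along each coordinate. The random positive definite test instance and all function names are illustrative assumptions.

# Illustrative sketch (assumption): cyclic vs. randomized coordinate descent
# on a convex quadratic f(x) = 0.5 x'Ax - b'x; not the worst-case example
# constructed in the paper.
import numpy as np

def coordinate_descent(A, b, num_cycles, rule="cyclic", seed=0):
    """Exact coordinate minimization: x_i <- x_i - (Ax - b)_i / A_ii."""
    rng = np.random.default_rng(seed)
    n = len(b)
    x = np.zeros(n)
    grad = A @ x - b                        # maintained gradient; O(n) per update
    for _ in range(num_cycles):
        # C-CD sweeps coordinates in order; R-CD samples them uniformly.
        order = range(n) if rule == "cyclic" else rng.integers(0, n, size=n)
        for i in order:
            step = grad[i] / A[i, i]        # exact line search along coordinate i
            x[i] -= step
            grad -= step * A[:, i]          # rank-one update keeps grad = Ax - b
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 50
    M = rng.standard_normal((n, n))
    A = M @ M.T + 0.1 * np.eye(n)           # random positive definite Hessian
    b = rng.standard_normal(n)
    x_star = np.linalg.solve(A, b)
    for rule in ("cyclic", "random"):
        x = coordinate_descent(A, b, num_cycles=100, rule=rule)
        print(rule, np.linalg.norm(x - x_star))

On typical random instances such as this one, both rules behave similarly; the paper's point is that a carefully constructed A forces the cyclic order to be much slower.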


Similar references

When Cyclic Coordinate Descent Outperforms Randomized Coordinate Descent

The coordinate descent (CD) method is a classical optimization algorithm that has seen a revival of interest because of its competitive performance in machine learning applications. A number of recent papers provided convergence rate estimates for its deterministic (cyclic) and randomized variants that differ in the selection of update coordinates. These estimates suggest randomized coordinate de...


Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

In this paper we develop a randomized block-coordinate descent method for minimizing the sum of a smooth and a simple nonsmooth block-separable convex function and prove that it obtains an ε-accurate solution with probability at least 1 − ρ in at most O((n/ε) log(1/ρ)) iterations, where n is the number of blocks. This extends recent results of Nesterov [Efficiency of coordinate descent methods o...


Toward a Noncommutative Arithmetic-geometric Mean Inequality: Conjectures, Case-studies, and Consequences

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress at theoretically evaluating the difference in performance between sampling with- and without-replacement in such algorithms. Focusing on least means squares...


Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences

Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress at theoretically evaluating the difference in performance between sampling with- and without-replacement in such algorithms. Focusing on least means squares...


Average/Worst-Case Gap of Quantum Query Complexities by On-Set Size

This paper considers the query complexity of the functions in the family F_{N,M} of N-variable Boolean functions with on-set size M, i.e., the number of inputs for which the function value is 1, where 1 ≤ M ≤ 2^N/2 is assumed without loss of generality because of the symmetry of the function values 0 and 1. Our main results are as follows: • There is a super-linear gap between the average-case and wo...



Journal:
  • CoRR

Volume: abs/1604.07130

Publication date: 2016